library(topGO)
library(knitr)   # to use kable()
library(limma)   # to use vennDiagram()
library(ggplot2)
library(variancePartition)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
                   isoforms = '../2019-07-26/transcripts/annotation.txt')
DGEDIR <- '../2020-01-08/'
ENRICHMENT <- paste('../2020-01-14/Enrichment', TAG, VAR, 'RData', sep='.')

Introduction

This is the enrichment analysis for genes regulated by regime. I am using the topGO package (Alexa and Rahnenfuhrer 2019), which is able to apply different algorithms. The starting point is an ordering of genes usually according to their association with a phenotype or any other relevant quantity, such as a measure of differential expression between two conditions. In this case, we order genes by their differential expression between regime levels. Following one of the suggestions in the manual, I originally ordered genes by either \(p\) value of the differential expression analysis, or by the (complement of the) amount of its expression variance explained by regime (see 2020-01-14). Those were quite equivalent orderings. However, they do not distinguish the sign of the fold change: both up- and down-regulated genes contribute to make a gene ontology term significant. It is one of the approaches recommended, and it makes sense, to the extent that among all the genes annotated with a function, some may display contrasting expression patterns, for example if both positive and negative regulators of that function are annotated with the same label. I thus, expect the use of \(p\) values in enrichment analysis to facilitate the detection of high level categories.

In contrast, genes sharing lower level categories (very specific GO terms) are more likely to be expressed in a similar way. Then, an ordering of genes that reflect the sign of the difference in expression levels between the two conditions compared would be more informative. In principle, enrichment methods should be able to detect an enrichment in either end of the list, rather than only in the top. That is actually the case for tests based on the Kolmogorov-Smirnov distribution, like GSEA (Subramanian et al., n.d.). I actually checked that reversing the order of genes produces the exact same results, as expected. Below, I use the Kolmogorov-Smirnov test.

For gene ordering, I use the \(t\) statistic of the differential expression analysis, which is positive when expression is higher in the random than in the regular regime, and vice versa.

Reading the data

Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by regime or p-value of differential expression test.

load(paste0(DGEDIR, TAG, '.RData'))
tagTStat    <- topTable(fitmm, coef = VAR, num = length(fitmm$F), sort.by='t', resort.by='t')[,'t',drop=FALSE]
annotation  <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))

To initialize the topGOdata object, I need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, the \(t\) statistics from the differential expression analysis. The structure() function adds attributes to an object.

TStats <- structure(tagTStat[,1], names = row.names(tagTStat))
rm(tagTStat)

There are 18598 genes scored with a \(t\) statistic. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.

head(annotation)
##       tagname                                                           goterms
## 1 XLOC_000002                                  GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009                                  GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010                       GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021                       GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036                                  GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)

There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.

for (tag in names(TStats)) {
   if (! tag %in% names(allgenes2GO)) {
      allgenes2GO <- append(allgenes2GO,
         structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
   }
}

Building the topGO object

Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.

I generate three datasets, to analyse each of the three ontologies.

selection <- function(allScores) {return(abs(allScores) > 5.0)}
GOdata.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = TStats,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = TStats,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = TStats,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
   Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

Running the tests

There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.

BP.elim     <- runTest(GOdata.BP, algorithm = "elim",     statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea      <- runTest(GOdata.BP, algorithm = "lea",      statistic = "ks")
MF.elim     <- runTest(GOdata.MF, algorithm = "elim",     statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea      <- runTest(GOdata.MF, algorithm = "lea",      statistic = "ks")
CC.elim     <- runTest(GOdata.CC, algorithm = "elim",     statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea      <- runTest(GOdata.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
   Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 31
BP weight01 1154 32
BP lea 1154 46
MF elim 576 47
MF weight01 576 42
MF lea 576 68
CC elim 291 17
CC weight01 291 17
CC lea 291 30
rm(ResultsSummary)

Results

I don’t see a way to distinguish GO terms that are significant because they tend to appear near the top of the list of genes from those that are significant because they tend to appear near the bottom. The density plots are useful, but take too much space to show them for all terms.

Note that not all terms listed in the tables below are significant. The scores for the three methods (elim, weight01 and lea) are non-corrected p-values.

Biological process

orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim     <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.tstat.sigTerms <- sigTerms

BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   BP.all[BP.all$Significant > BP.all$Expected,],
   caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
2 GO:0006508 proteolysis 442 10 6.24 3 8.1e-10 8.4e-19 8.1e-10
4 GO:0006030 chitin metabolic process 74 4 1.04 4 1.2e-09 3.0e-09 1.2e-09
6 GO:0034220 ion transmembrane transport 147 3 2.07 10 2.6e-05 1.9e-07 2.2e-08
7 GO:0055085 transmembrane transport 500 8 7.06 7 1.2e-06 4.2e-07 6.9e-13
9 GO:0005975 carbohydrate metabolic process 202 10 2.85 18 0.00121 3.3e-06 0.00024
15 GO:0007160 cell-matrix adhesion 16 1 0.23 14 0.00020 0.00020 0.00020
19 GO:0005992 trehalose biosynthetic process 6 2 0.08 17 0.00110 0.00110 0.00110
22 GO:0007017 microtubule-based process 195 4 2.75 148 0.09189 0.00189 2.5e-06
23 GO:0006629 lipid metabolic process 195 4 2.75 678 0.82452 0.00228 0.82452
25 GO:0006520 cellular amino acid metabolic process 117 3 1.65 286 0.26064 0.00374 0.32073
kable(
   BP.all[BP.all$Significant < BP.all$Expected,],
   caption = "Most under-represented terms of the Biological Process ontology.")
Most under-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0007186 G protein-coupled receptor signaling pat… 206 2 2.91 1 1.3e-26 4.4e-27 1.3e-26
3 GO:0006814 sodium ion transport 88 0 1.24 2 5.8e-10 5.8e-10 5.8e-10
5 GO:0006813 potassium ion transport 56 0 0.79 5 1.4e-07 1.4e-07 8.8e-10
8 GO:0006486 protein glycosylation 75 0 1.06 11 0.00011 6.6e-07 0.00011
10 GO:0060271 cilium assembly 46 0 0.65 6 9.5e-07 7.6e-06 1.4e-08
11 GO:0015074 DNA integration 34 0 0.48 8 1.5e-05 1.5e-05 1.5e-05
12 GO:0006811 ion transport 436 5 6.15 12 0.00014 2.6e-05 7.0e-19
13 GO:0007165 signal transduction 544 7 7.68 23 0.00698 0.00013 0.00698
14 GO:0007156 homophilic cell adhesion via plasma memb… 21 0 0.30 13 0.00015 0.00015 0.00015
16 GO:0003341 cilium movement 10 0 0.14 15 0.00026 0.00026 0.00026
17 GO:0006355 regulation of transcription, DNA-templat… 355 2 5.01 176 0.12252 0.00060 0.12252
18 GO:0070588 calcium ion transmembrane transport 30 0 0.42 16 0.00101 0.00101 0.00101
20 GO:0007154 cell communication 565 7 7.97 9 1.9e-05 0.00132 0.00016
21 GO:0048870 cell motility 14 0 0.20 22 0.00581 0.00158 0.00581
24 GO:0007018 microtubule-based movement 115 1 1.62 20 0.00303 0.00364 6.4e-05
26 GO:0007166 cell surface receptor signaling pathway 72 1 1.02 293 0.26315 0.00387 0.26315
27 GO:0006820 anion transport 81 1 1.14 74 0.02458 0.00536 0.02458
28 GO:0022607 cellular component assembly 217 0 3.06 986 0.97883 0.00539 0.97305
29 GO:0071805 potassium ion transmembrane transport 19 0 0.27 19 0.00124 0.00585 0.00124
30 GO:0006313 transposition, DNA-mediated 10 0 0.14 25 0.00782 0.00782 0.00782
31 GO:0007015 actin filament organization 31 0 0.44 225 0.17953 0.00802 0.01389
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term = Term(GOTERM[sigTerms]),
                 Definition = Definition(GOTERM[sigTerms]),
                 PValue=score(BP.weight01)[sigTerms]),
      caption = paste('Biological process terms significantly associated with',
                      VAR, 'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0007186 G protein-coupled receptor signaling pathway A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). 0.0000000
GO:0006508 proteolysis The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. 0.0000000
GO:0006814 sodium ion transport The directed movement of sodium ions (Na+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0000000
GO:0006030 chitin metabolic process The chemical reactions and pathways involving chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0000000
GO:0006813 potassium ion transport The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0000001
GO:0034220 ion transmembrane transport A process in which an ion is transported across a membrane. 0.0000002
GO:0055085 transmembrane transport The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. 0.0000004
GO:0006486 protein glycosylation A protein modification process that results in the addition of a carbohydrate or carbohydrate derivative unit to a protein amino acid, e.g. the addition of glycan chains to proteins. 0.0000007
GO:0005975 carbohydrate metabolic process The chemical reactions and pathways involving carbohydrates, any of a group of organic compounds based of the general formula Cx(H2O)y. Includes the formation of carbohydrate derivatives by the addition of a carbohydrate residue to another molecule. 0.0000033
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0000076
GO:0015074 DNA integration The process in which a segment of DNA is incorporated into another, usually larger, DNA molecule such as a chromosome. 0.0000147
GO:0006811 ion transport The directed movement of charged atoms or small charged molecules into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0000263
GO:0007165 signal transduction The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal (e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light), or for signal transduction in the absence of ligand, signal-withdrawal or the activity of a constitutively active receptor. Signal transduction ends with regulation of a downstream cellular process, e.g. regulation of transcription or regulation of a metabolic process. Signal transduction covers signaling from receptors located on the surface of the cell and signaling via molecules located within the cell. For signaling between cells, signal transduction is restricted to events at and within the receiving cell. 0.0001347
GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules The attachment of a plasma membrane adhesion molecule in one cell to an identical molecule in an adjacent cell. 0.0001501
GO:0007160 cell-matrix adhesion The binding of a cell to the extracellular matrix via adhesion molecules. 0.0001997
GO:0003341 cilium movement The directed, self-propelled movement of a cilium. 0.0002568
GO:0070588 calcium ion transmembrane transport A process in which a calcium ion is transported from one side of a membrane to the other by means of some agent such as a transporter or pore. 0.0010143
GO:0005992 trehalose biosynthetic process The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. 0.0010957
GO:0007154 cell communication Any process that mediates interactions between a cell and its surroundings. Encompasses interactions such as signaling or attachment between one cell and another cell, between a cell and an extracellular matrix, or between a cell and any other aspect of its environment. 0.0013231
GO:0048870 cell motility Any process involved in the controlled self-propelled movement of a cell that results in translocation of the cell from one place to another. 0.0015760
GO:0007018 microtubule-based movement A microtubule-based process that results in the movement of organelles, other microtubules, or other cellular components. Examples include motor-driven movement along microtubules and movement driven by polymerization or depolymerization of microtubules. 0.0036411
GO:0071805 potassium ion transmembrane transport A process in which a potassium ion is transported from one side of a membrane to the other. 0.0058542
GO:0006313 transposition, DNA-mediated Any process involved in a type of transpositional recombination which occurs via a DNA intermediate. 0.0078178
GO:0070286 axonemal dynein complex assembly The aggregation, arrangement and bonding together of a set of components to form an axonemal dynein complex, a dynein complex found in eukaryotic cilia and flagella, in which the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. 0.0099394

I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.

showSigOfNodes(GOdata.BP, score(BP.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 155 
## Number of Edges = 276 
## 
## $complete.dag
## [1] "A graph with 155 nodes."

This is just a example of the most significant GO term:

showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)

Molecular function

orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim     <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.tstat.sigTerms <- sigTerms

MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   MF.all[MF.all$Significant > MF.all$Expected,],
   caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
4 GO:0008061 chitin binding 77 4 1.13 3 1.1e-09 1.1e-09 1.1e-09
6 GO:0004252 serine-type endopeptidase activity 88 4 1.29 6 2.3e-08 2.3e-08 2.3e-08
14 GO:0005200 structural constituent of cytoskeleton 17 2 0.25 13 0.00023 0.00023 0.00023
19 GO:0030248 cellulose binding 7 1 0.10 21 0.00090 0.00090 0.00090
20 GO:0020037 heme binding 95 3 1.40 23 0.00112 0.00112 0.00112
21 GO:0016715 oxidoreductase activity, acting on paire… 14 3 0.21 24 0.00123 0.00123 0.00123
23 GO:0031409 pigment binding 10 1 0.15 28 0.00172 0.00172 0.00172
24 GO:0030246 carbohydrate binding 51 2 0.75 39 0.00644 0.00184 0.00019
28 GO:0004601 peroxidase activity 34 2 0.50 31 0.00300 0.00300 2.8e-05
30 GO:0008233 peptidase activity 399 8 5.86 304 0.64934 0.00375 0.96640
39 GO:0004553 hydrolase activity, hydrolyzing O-glycos… 100 4 1.47 26 0.00164 0.00806 0.00164
42 GO:0008017 microtubule binding 62 1 0.91 47 0.00959 0.00959 0.00959
47 GO:0052689 carboxylic ester hydrolase activity 43 1 0.63 54 0.01154 0.01226 0.01154
kable(
   MF.all[MF.all$Significant < MF.all$Expected,],
   caption = "Most under-represented terms of the Molecular Function ontology.")
Most under-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0005509 calcium ion binding 361 3 5.30 1 < 1e-30 < 1e-30 < 1e-30
2 GO:0004930 G protein-coupled receptor activity 180 2 2.64 2 5.7e-23 9.3e-22 1.3e-26
3 GO:0043565 sequence-specific DNA binding 146 0 2.14 4 2.0e-09 1.4e-11 2.0e-09
5 GO:0005272 sodium channel activity 73 0 1.07 5 7.0e-09 7.0e-09 7.0e-09
7 GO:0003774 motor activity 131 1 1.92 7 1.3e-07 1.3e-07 2.9e-07
8 GO:0004181 metallocarboxypeptidase activity 19 0 0.28 8 2.2e-07 2.2e-07 2.2e-07
9 GO:0005515 protein binding 2101 30 30.86 17 0.00070 1.4e-06 0.00128
10 GO:0005249 voltage-gated potassium channel activity 27 0 0.40 9 1.7e-06 1.7e-06 1.7e-06
11 GO:0004888 transmembrane signaling receptor activit… 251 2 3.69 11 6.3e-06 6.1e-06 3.6e-30
12 GO:0008417 fucosyltransferase activity 33 0 0.48 12 8.8e-06 8.8e-06 8.8e-06
13 GO:0005230 extracellular ligand-gated ion channel a… 54 0 0.79 10 1.7e-06 1.0e-05 9.2e-08
15 GO:0004222 metalloendopeptidase activity 73 1 1.07 14 0.00031 0.00031 0.00031
16 GO:0005328 neurotransmitter:sodium symporter activi… 18 0 0.26 15 0.00055 0.00055 0.00055
17 GO:0051015 actin filament binding 30 0 0.44 16 0.00070 0.00070 0.00070
18 GO:0003777 microtubule motor activity 98 1 1.44 20 0.00085 0.00085 0.00085
22 GO:0004096 catalase activity 15 0 0.22 25 0.00156 0.00156 0.00156
25 GO:0004983 neuropeptide Y receptor activity 9 0 0.13 29 0.00188 0.00188 0.00188
26 GO:0015293 symporter activity 32 0 0.47 34 0.00447 0.00238 1.6e-05
27 GO:0008378 galactosyltransferase activity 23 0 0.34 30 0.00256 0.00256 0.00256
29 GO:0004190 aspartic-type endopeptidase activity 28 0 0.41 32 0.00306 0.00306 0.00306
31 GO:0008528 G protein-coupled peptide receptor activ… 15 0 0.22 33 0.00428 0.00428 1.6e-05
32 GO:0031683 G-protein beta/gamma-subunit complex bin… 11 0 0.16 37 0.00523 0.00523 0.00523
33 GO:0004089 carbonate dehydratase activity 11 0 0.16 38 0.00610 0.00610 0.00610
34 GO:0005452 inorganic anion exchanger activity 6 0 0.09 40 0.00676 0.00676 0.00676
35 GO:0070006 metalloaminopeptidase activity 5 0 0.07 41 0.00684 0.00684 0.00684
36 GO:0022848 acetylcholine-gated cation-selective cha… 16 0 0.24 42 0.00687 0.00687 0.00687
37 GO:0008138 protein tyrosine/serine/threonine phosph… 23 0 0.34 43 0.00749 0.00749 0.00749
38 GO:0030215 semaphorin receptor binding 7 0 0.10 44 0.00779 0.00779 0.00779
40 GO:0004435 phosphatidylinositol phospholipase C act… 6 0 0.09 46 0.00834 0.00834 0.00834
41 GO:0016757 transferase activity, transferring glyco… 194 2 2.85 335 0.71845 0.00903 0.01913
43 GO:0005216 ion channel activity 224 0 3.29 27 0.00169 0.01088 1.9e-18
44 GO:0019825 oxygen binding 8 0 0.12 51 0.01146 0.01146 0.01146
45 GO:0004970 ionotropic glutamate receptor activity 7 0 0.10 52 0.01146 0.01146 0.01146
46 GO:0019205 nucleobase-containing compound kinase ac… 26 0 0.38 35 0.00461 0.01193 0.00461
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MF.weight01)[sigTerms]),
      caption = paste('Molecular function terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Molecular function terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0005509 calcium ion binding Interacting selectively and non-covalently with calcium ions (Ca2+). 0.0000000
GO:0004930 G protein-coupled receptor activity Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0000000
GO:0043565 sequence-specific DNA binding Interacting selectively and non-covalently with DNA of a specific nucleotide composition, e.g. GC-rich DNA binding, or with a specific sequence motif or type of DNA e.g. promotor binding or rDNA binding. 0.0000000
GO:0008061 chitin binding Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0000000
GO:0005272 sodium channel activity Enables the facilitated diffusion of a sodium ion (by an energy-independent process) involving passage through a transmembrane aqueous pore or channel without evidence for a carrier-mediated mechanism. 0.0000000
GO:0004252 serine-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). 0.0000000
GO:0003774 motor activity Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. 0.0000001
GO:0004181 metallocarboxypeptidase activity Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0000002
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000014
GO:0005249 voltage-gated potassium channel activity Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. 0.0000017
GO:0004888 transmembrane signaling receptor activity Combining with an extracellular or intracellular signal and transmitting the signal from one side of the membrane to the other to initiate a change in cell activity or state as part of signal transduction. 0.0000061
GO:0008417 fucosyltransferase activity Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. 0.0000088
GO:0005230 extracellular ligand-gated ion channel activity Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. 0.0000100
GO:0005200 structural constituent of cytoskeleton The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. 0.0002305
GO:0004222 metalloendopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0003142
GO:0005328 neurotransmitter:sodium symporter activity Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: neurotransmitter(out) + Na+(out) = neurotransmitter(in) + Na+(in). 0.0005460
GO:0051015 actin filament binding Interacting selectively and non-covalently with an actin filament, also known as F-actin, a helical filamentous polymer of globular G-actin subunits. 0.0006982
GO:0003777 microtubule motor activity Catalysis of movement along a microtubule, coupled to the hydrolysis of a nucleoside triphosphate (usually ATP). 0.0008454
GO:0030248 cellulose binding Interacting selectively and non-covalently with cellulose. 0.0009042
GO:0020037 heme binding Interacting selectively and non-covalently with heme, any compound of iron complexed in a porphyrin (tetrapyrrole) ring. 0.0011192
GO:0016715 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen Catalysis of an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from reduced ascorbate and one other donor, and one atom of oxygen is incorporated into one donor. 0.0012308
GO:0004096 catalase activity Catalysis of the reaction: 2 hydrogen peroxide = O2 + 2 H2O. 0.0015600
GO:0031409 pigment binding Interacting selectively and non-covalently with a pigment, any general or particular coloring matter in living organisms, e.g. melanin. 0.0017172
GO:0030246 carbohydrate binding Interacting selectively and non-covalently with any carbohydrate, which includes monosaccharides, oligosaccharides and polysaccharides as well as substances derived from monosaccharides by reduction of the carbonyl group (alditols), by oxidation of one or more hydroxy groups to afford the corresponding aldehydes, ketones, or carboxylic acids, or by replacement of one or more hydroxy group(s) by a hydrogen atom. Cyclitols are generally not regarded as carbohydrates. 0.0018445
GO:0004983 neuropeptide Y receptor activity Combining with neuropeptide Y to initiate a change in cell activity. 0.0018782
GO:0015293 symporter activity Enables the active transport of a solute across a membrane by a mechanism whereby two or more species are transported together in the same direction in a tightly coupled process not directly linked to a form of energy other than chemiosmotic energy. 0.0023817
GO:0008378 galactosyltransferase activity Catalysis of the transfer of a galactosyl group to an acceptor molecule, typically another carbohydrate or a lipid. 0.0025622
GO:0004601 peroxidase activity Catalysis of the reaction: donor + hydrogen peroxide = oxidized donor + 2 H2O. 0.0030028
GO:0004190 aspartic-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which a water molecule bound by the side chains of aspartic residues at the active center acts as a nucleophile. 0.0030648
GO:0008528 G protein-coupled peptide receptor activity Combining with a peptide and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0042829
GO:0031683 G-protein beta/gamma-subunit complex binding Interacting selectively and non-covalently with a complex of G-protein beta/gamma subunits. 0.0052275
GO:0004089 carbonate dehydratase activity Catalysis of the reaction: H2CO3 = CO2 + H2O. 0.0060993
GO:0005452 inorganic anion exchanger activity Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: inorganic anion A(out) + inorganic anion B(in) = inorganic anion A(in) + inorganic anion B(out). 0.0067585
GO:0070006 metalloaminopeptidase activity Catalysis of the hydrolysis of N-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0068381
GO:0022848 acetylcholine-gated cation-selective channel activity Selectively enables the transmembrane transfer of a cation by a channel that opens upon binding acetylcholine. 0.0068744
GO:0008138 protein tyrosine/serine/threonine phosphatase activity Catalysis of the reactions: protein serine + H2O = protein serine + phosphate; protein threonine phosphate + H2O = protein threonine + phosphate; and protein tyrosine phosphate + H2O = protein tyrosine + phosphate. 0.0074915
GO:0030215 semaphorin receptor binding Interacting selectively and non-covalently with semaphorin receptors. 0.0077865
GO:0004553 hydrolase activity, hydrolyzing O-glycosyl compounds Catalysis of the hydrolysis of any O-glycosyl bond. 0.0080587
GO:0004435 phosphatidylinositol phospholipase C activity Catalysis of the reaction: 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate + H(2)O = 1,2-diacylglycerol + 1D-myo-inositol 1,4,5-trisphosphate + H(+). 0.0083379
GO:0008017 microtubule binding Interacting selectively and non-covalently with microtubules, filaments composed of tubulin monomers. 0.0095925
showSigOfNodes(GOdata.MF, score(MF.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 149 
## Number of Edges = 194 
## 
## $complete.dag
## [1] "A graph with 149 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)

Cellular component

orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim     <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.tstat.sigTerms <- sigTerms

CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
   CC.all[CC.all$Significant > CC.all$Expected,],
   caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0016021 integral component of membrane 885 16 11.70 1 < 1e-30 < 1e-30 < 1e-30
2 GO:0016020 membrane 1565 23 20.68 3 6.9e-12 1.5e-19 < 1e-30
3 GO:0005576 extracellular region 140 6 1.85 2 4.1e-15 7.5e-14 1.2e-17
10 GO:0005874 microtubule 29 2 0.38 11 0.00221 0.00221 0.00221
12 GO:0042555 MCM complex 10 4 0.13 14 0.00302 0.00302 0.00302
17 GO:0005887 integral component of plasma membrane 118 2 1.56 13 0.00293 0.00713 0.00293
kable(
   CC.all[CC.all$Significant < CC.all$Expected,],
   caption = "Most under-represented terms of the Cellular Component ontology.")
Most under-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
4 GO:0016459 myosin complex 33 0 0.44 4 5.2e-08 5.2e-08 5.2e-08
5 GO:0008076 voltage-gated potassium channel complex 13 0 0.17 5 3.2e-06 3.2e-06 3.2e-06
6 GO:0005886 plasma membrane 170 2 2.25 6 0.00019 0.00016 2.1e-12
7 GO:0098797 plasma membrane protein complex 34 0 0.45 8 0.00033 0.00049 3.7e-09
8 GO:0031012 extracellular matrix 9 0 0.12 9 0.00081 0.00081 0.00081
9 GO:0034704 calcium channel complex 5 0 0.07 10 0.00185 0.00185 0.00185
11 GO:0005929 cilium 27 0 0.36 12 0.00279 0.00279 9.8e-08
13 GO:0005858 axonemal dynein complex 5 0 0.07 15 0.00371 0.00371 0.00371
14 GO:0045211 postsynaptic membrane 16 0 0.21 16 0.00428 0.00428 0.00428
15 GO:0031225 anchored component of membrane 6 0 0.08 17 0.00588 0.00588 0.00588
16 GO:0044430 cytoskeletal part 152 2 2.01 41 0.05708 0.00594 2.7e-08
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CC.weight01)[sigTerms]),
      caption = paste('Cellular component terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0016021 integral component of membrane The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000000
GO:0016020 membrane A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. 0.0000000
GO:0005576 extracellular region The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. 0.0000000
GO:0016459 myosin complex A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. 0.0000001
GO:0008076 voltage-gated potassium channel complex A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. 0.0000032
GO:0005886 plasma membrane The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. 0.0001593
GO:0098797 plasma membrane protein complex Any protein complex that is part of the plasma membrane. 0.0004918
GO:0031012 extracellular matrix A structure lying external to one or more cells, which provides structural support, biochemical or biomechanical cues for cells or tissues. 0.0008138
GO:0034704 calcium channel complex An ion channel complex through which calcium ions pass. 0.0018495
GO:0005874 microtubule Any of the long, generally straight, hollow tubes of internal diameter 12-15 nm and external diameter 24 nm found in a wide variety of eukaryotic cells; each consists (usually) of 13 protofilaments of polymeric tubulin, staggered in such a manner that the tubulin monomers are arranged in a helical pattern on the microtubular surface, and with the alpha/beta axes of the tubulin subunits parallel to the long axis of the tubule; exist in equilibrium with pool of tubulin monomers and can be rapidly assembled or disassembled in response to physiological stimuli; concerned with force generation, e.g. in the spindle. 0.0022146
GO:0005929 cilium A specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface and of some cytoplasmic parts. Each cilium is largely bounded by an extrusion of the cytoplasmic (plasma) membrane, and contains a regular longitudinal array of microtubules, anchored to a basal body. 0.0027889
GO:0042555 MCM complex A hexameric protein complex required for the initiation and regulation of DNA replication. 0.0030211
GO:0005858 axonemal dynein complex A dynein complex found in eukaryotic cilia and flagella; the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. 0.0037087
GO:0045211 postsynaptic membrane A specialized area of membrane facing the presynaptic membrane on the tip of the nerve ending and separated from it by a minute cleft (the synaptic cleft). Neurotransmitters cross the synaptic cleft and transmit the signal to the postsynaptic membrane. 0.0042845
GO:0031225 anchored component of membrane The component of a membrane consisting of the gene products that are tethered to the membrane only by a covalently attached anchor, such as a lipid group that is embedded in the membrane. Gene products with peptide sequences that are embedded in the membrane are excluded from this grouping. 0.0058784
GO:0005887 integral component of plasma membrane The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0071259
showSigOfNodes(GOdata.CC, score(CC.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 68 
## Number of Edges = 117 
## 
## $complete.dag
## [1] "A graph with 68 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)

Comparison between the two ordering of genes

I want to compare these results with those from 2020-01-14, where I used the \(p\) value of the differential expression analysis to order the genes. I can import the data from 2020-01-14, but I need to do it in a new environment to prevent overwriting the current results.

load(ENRICHMENT, ex <- new.env())

Biological process

allTerms <- usedGO(GOdata.BP)
BP.pvalue.sigTerms <- with(ex, BP.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat  = allTerms %in% BP.tstat.sigTerms,
                             PValue = allTerms %in% BP.pvalue.sigTerms)))

# New terms:
as.data.frame(Term(GOTERM[setdiff(BP.tstat.sigTerms, BP.pvalue.sigTerms)]))
##               Term(GOTERM[setdiff(BP.tstat.sigTerms, BP.pvalue.sigTerms)])
## GO:0006814                                            sodium ion transport
## GO:0006030                                        chitin metabolic process
## GO:0006486                                           protein glycosylation
## GO:0005975                                  carbohydrate metabolic process
## GO:0015074                                                 DNA integration
## GO:0006811                                                   ion transport
## GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules
## GO:0070588                             calcium ion transmembrane transport
## GO:0007154                                              cell communication
## GO:0048870                                                   cell motility
## GO:0007018                                      microtubule-based movement
## GO:0071805                           potassium ion transmembrane transport
## GO:0006313                                     transposition, DNA-mediated
## GO:0070286                                axonemal dynein complex assembly
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(BP.pvalue.sigTerms, BP.tstat.sigTerms)]))
##                 Term(GOTERM[setdiff(BP.pvalue.sigTerms, BP.tstat.sigTerms)])
## GO:0006468                                           protein phosphorylation
## GO:0055114                                       oxidation-reduction process
## GO:0006289                                        nucleotide-excision repair
## GO:0006355                        regulation of transcription, DNA-templated
## GO:0006470                                         protein dephosphorylation
## GO:0006979                                      response to oxidative stress
## GO:1901642                                nucleoside transmembrane transport
## GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process
## GO:0046942                                         carboxylic acid transport
BPp.weight01 <- with(ex, BP.weight01)
ggplot(data = data.frame(TStat  = rank(score(BP.weight01))[allTerms],
                         PValue = rank(score(BPp.weight01))[allTerms]),
       mapping = aes(x = TStat, y = PValue)) +
   geom_point() + geom_smooth() + xlab('Using gene t statistics') +
   ylab('Using gene p values') +
   ggtitle('Ordering of BP terms by significance')

Molecular function

allTerms <- usedGO(GOdata.MF)
MF.pvalue.sigTerms <- with(ex, MF.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat  = allTerms %in% MF.tstat.sigTerms,
                             PValue = allTerms %in% MF.pvalue.sigTerms)))

# New terms:
as.data.frame(Term(GOTERM[setdiff(MF.tstat.sigTerms, MF.pvalue.sigTerms)]))
##            Term(GOTERM[setdiff(MF.tstat.sigTerms, MF.pvalue.sigTerms)])
## GO:0043565                                sequence-specific DNA binding
## GO:0008061                                               chitin binding
## GO:0005272                                      sodium channel activity
## GO:0004888                    transmembrane signaling receptor activity
## GO:0008417                                  fucosyltransferase activity
## GO:0005230              extracellular ligand-gated ion channel activity
## GO:0005328                   neurotransmitter:sodium symporter activity
## GO:0051015                                       actin filament binding
## GO:0003777                                   microtubule motor activity
## GO:0020037                                                 heme binding
## GO:0004096                                            catalase activity
## GO:0031409                                              pigment binding
## GO:0030246                                         carbohydrate binding
## GO:0004983                             neuropeptide Y receptor activity
## GO:0015293                                           symporter activity
## GO:0008378                               galactosyltransferase activity
## GO:0004601                                          peroxidase activity
## GO:0004190                         aspartic-type endopeptidase activity
## GO:0008528                  G protein-coupled peptide receptor activity
## GO:0031683                 G-protein beta/gamma-subunit complex binding
## GO:0004089                               carbonate dehydratase activity
## GO:0005452                           inorganic anion exchanger activity
## GO:0070006                               metalloaminopeptidase activity
## GO:0022848        acetylcholine-gated cation-selective channel activity
## GO:0030215                                  semaphorin receptor binding
## GO:0004553         hydrolase activity, hydrolyzing O-glycosyl compounds
## GO:0004435                phosphatidylinositol phospholipase C activity
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(MF.pvalue.sigTerms, MF.tstat.sigTerms)]))
##                Term(GOTERM[setdiff(MF.pvalue.sigTerms, MF.tstat.sigTerms)])
## GO:0005525                                                      GTP binding
## GO:0004672                                          protein kinase activity
## GO:0005524                                                      ATP binding
## GO:0005507                                               copper ion binding
## GO:0061630                                ubiquitin protein ligase activity
## GO:0004198           calcium-dependent cysteine-type endopeptidase activity
## GO:0003924                                                  GTPase activity
## GO:0042626 ATPase activity, coupled to transmembrane movement of substances
## GO:0005337                    nucleoside transmembrane transporter activity
## GO:0016787                                               hydrolase activity
## GO:0016840                                   carbon-nitrogen lyase activity
## GO:0047631                                ADP-ribose diphosphatase activity
## GO:0005506                                                 iron ion binding
MFp.weight01 <- with(ex, MF.weight01)
ggplot(data = data.frame(TStat  = rank(score(MF.weight01))[allTerms],
                         PValue = rank(score(MFp.weight01))[allTerms]),
       mapping = aes(x = TStat, y = PValue)) +
   geom_point() + geom_smooth() + xlab('Using gene t statistics') +
   ylab('Using gene p values') +
   ggtitle('Ordering of MF terms by significance')

Cellular component

allTerms <- usedGO(GOdata.CC)
CC.pvalue.sigTerms <- with(ex, CC.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat  = allTerms %in% CC.tstat.sigTerms,
                             PValue = allTerms %in% CC.pvalue.sigTerms)))

# New terms:
as.data.frame(Term(GOTERM[setdiff(CC.tstat.sigTerms, CC.pvalue.sigTerms)]))
##            Term(GOTERM[setdiff(CC.tstat.sigTerms, CC.pvalue.sigTerms)])
## GO:0098797                              plasma membrane protein complex
## GO:0031012                                         extracellular matrix
## GO:0034704                                      calcium channel complex
## GO:0005874                                                  microtubule
## GO:0005929                                                       cilium
## GO:0005858                                      axonemal dynein complex
## GO:0045211                                        postsynaptic membrane
## GO:0031225                               anchored component of membrane
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(CC.pvalue.sigTerms, CC.tstat.sigTerms)]))
##            Term(GOTERM[setdiff(CC.pvalue.sigTerms, CC.tstat.sigTerms)])
## GO:0098800                 inner mitochondrial membrane protein complex
## GO:0090575               RNA polymerase II transcription factor complex
CCp.weight01 <- with(ex, CC.weight01)
ggplot(data = data.frame(TStat  = rank(score(CC.weight01))[allTerms],
                         PValue = rank(score(CCp.weight01))[allTerms]),
       mapping = aes(x = TStat, y = PValue)) +
   geom_point() + geom_smooth() + xlab('Using gene t statistics') +
   ylab('Using gene p values') +
   ggtitle('Ordering of CC terms by significance')

As expected, ordering genes by the \(t\) statistic produces results different from those when ordering genes by \(p\) value. Some GO terms enriched when using \(p\) values are missed when using \(t\) statistic, typically because those terms are assigned to genes both sub- and overexpressed. See for example the case of the protein phosphorylation process:

showGroupDensity(GOdata.BP, 'GO:0006468', rm.one=FALSE)

Both the genes annotated with that function and their complement display a snake-shaped, bimodal distribution, the former being more extreme. It is not irrelevant, but more difficult to interprete than a term being significant because all their genes behave in a similar way.

Session info

save(allgenes2GO,
     GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
     GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
     GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
     file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] Rgraphviz_2.30.0         variancePartition_1.16.1 scales_1.1.0            
##  [4] foreach_1.5.0            ggplot2_3.3.0            limma_3.42.2            
##  [7] knitr_1.28               topGO_2.38.1             SparseM_1.78            
## [10] GO.db_3.10.0             AnnotationDbi_1.48.0     IRanges_2.20.2          
## [13] S4Vectors_0.24.4         Biobase_2.46.0           graph_1.64.0            
## [16] BiocGenerics_0.32.0     
## 
## loaded via a namespace (and not attached):
##  [1] bit64_0.9-7         splines_3.6.3       gtools_3.8.2       
##  [4] assertthat_0.2.1    statmod_1.4.34      highr_0.8          
##  [7] blob_1.2.1          yaml_2.2.1          progress_1.2.2     
## [10] pillar_1.4.3        RSQLite_2.2.0       lattice_0.20-41    
## [13] glue_1.4.0          digest_0.6.25       minqa_1.2.4        
## [16] colorspace_1.4-1    htmltools_0.4.0     Matrix_1.2-18      
## [19] plyr_1.8.6          pkgconfig_2.0.3     purrr_0.3.4        
## [22] gdata_2.18.0        BiocParallel_1.20.1 lme4_1.1-23        
## [25] tibble_3.0.1        mgcv_1.8-31         farver_2.0.3       
## [28] ellipsis_0.3.0      withr_2.2.0         pbkrtest_0.4-8.6   
## [31] magrittr_1.5        crayon_1.3.4        memoise_1.1.0      
## [34] evaluate_0.14       doParallel_1.0.15   nlme_3.1-147       
## [37] MASS_7.3-51.6       gplots_3.0.3        tools_3.6.3        
## [40] prettyunits_1.1.1   hms_0.5.3           lifecycle_0.2.0    
## [43] matrixStats_0.56.0  stringr_1.4.0       munsell_0.5.0      
## [46] colorRamps_2.3      compiler_3.6.3      caTools_1.18.0     
## [49] rlang_0.4.6         nloptr_1.2.2.1      iterators_1.0.12   
## [52] labeling_0.3        bitops_1.0-6        rmarkdown_2.1      
## [55] boot_1.3-25         gtable_0.3.0        codetools_0.2-16   
## [58] DBI_1.1.0           reshape2_1.4.4      R6_2.4.1           
## [61] dplyr_0.8.5         bit_1.1-15.2        KernSmooth_2.23-17 
## [64] stringi_1.4.6       Rcpp_1.0.4.6        vctrs_0.2.4        
## [67] tidyselect_1.0.0    xfun_0.13

Alexa, Adrian, and Jorg Rahnenfuhrer. 2019. TopGO: Enrichment Analysis for Gene Ontology.

Subramanian, A., P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, et al. n.d. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences of the U.s.a. 102 (43): 15545–50. doi:10.1073兾pnas.0506580102.